@MaxGekk
Member

@MaxGekk MaxGekk commented Jan 20, 2019

What changes were proposed in this pull request?

  • Avoid using Timestamp.valueOf and Date.valueOf when parsing TimestampType and DateType literal values in Literal.fromString, since these methods use the hybrid (Julian + Gregorian) calendar internally.
  • Replace them with stringToDate and stringToTimestamp, which have already been ported to the Proleptic Gregorian calendar required by the SQL standard.
  • Reuse Literal.fromString from AstBuilder to parse Timestamp and Date literal values.
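
The calendar difference behind the first two bullets can be seen without Spark. The sketch below is only an illustration: `CalendarSketch` and its helpers are hypothetical names, and it uses `java.util.GregorianCalendar` as a stand-in for the hybrid calendar, comparing it with `java.time.LocalDate`, which follows the Proleptic Gregorian calendar.

```scala
import java.time.LocalDate
import java.util.{Calendar, Date, GregorianCalendar, TimeZone}

object CalendarSketch {
  private val MillisPerDay = 24L * 60 * 60 * 1000

  // Days since 1970-01-01 for a local date interpreted by GregorianCalendar.
  // With pureGregorian = false the calendar keeps its default Julian/Gregorian
  // cutover in October 1582, which is the hybrid behaviour.
  def calendarEpochDay(year: Int, month: Int, day: Int, pureGregorian: Boolean): Long = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    // Moving the cutover to the earliest possible instant gives a pure Gregorian calendar.
    if (pureGregorian) cal.setGregorianChange(new Date(Long.MinValue))
    cal.clear()
    cal.set(year, month - 1, day) // Calendar months are zero-based
    java.lang.Math.floorDiv(cal.getTimeInMillis, MillisPerDay)
  }

  // The same quantity under the Proleptic Gregorian calendar used by java.time.
  def prolepticEpochDay(year: Int, month: Int, day: Int): Long =
    LocalDate.of(year, month, day).toEpochDay

  def main(args: Array[String]): Unit = {
    println(s"hybrid:    ${calendarEpochDay(1000, 1, 1, pureGregorian = false)}")
    println(s"proleptic: ${prolepticEpochDay(1000, 1, 1)}")
  }
}
```

Dates after the 1582 cutover map to the same day number in both calendars; the two diverge only for earlier dates, which is where the choice of parsing method changes the resulting literal.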

How was this patch tested?

The changes were tested by ExpressionParserSuite and LiteralExpressionSuite.

checkEvaluation(Literal.fromString("Databricks", StringType), "Databricks")
val dateString = "1970-01-01"
checkEvaluation(Literal.fromString(dateString, DateType), java.sql.Date.valueOf(dateString))
val timestampString = "0000-01-01 00:00:00"
Member Author

@MaxGekk MaxGekk Jan 20, 2019

The year 0000 did not exist in our current era, and it is considered as an invalid year by java.time implementations.

Member

Does that mean a behavior change introduced by SPARK-26178, SPARK-26243, SPARK-26424 in 3.0.0? I'm expecting that you already documented that before changing this test case.

it is considered as an invalid year by java.time implementations.

Member Author

I would say it is a bug fix. I guess the old (current) implementation wasn't strict enough in checking its input.
Year zero does not exist in the common era (CE), nor before the common era (BCE) (see https://en.wikipedia.org/wiki/Year_zero).
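
One place where `java.time` rejects year zero can be reproduced directly. The sketch below assumes the parser uses a year-of-era pattern (`yyyy`); that is an assumption about the formatter, not something stated in this thread, and `YearZeroSketch` is a hypothetical name. The proleptic-year pattern `uuuu` accepts the same input, so whether `0000` is rejected depends on the pattern letter.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object YearZeroSketch {
  // Year-of-era: valid values start at 1, so "0000" fails to resolve.
  private val yearOfEra = DateTimeFormatter.ofPattern("yyyy-MM-dd")
  // Proleptic year: year 0 is the year immediately before year 1.
  private val prolepticYear = DateTimeFormatter.ofPattern("uuuu-MM-dd")

  def parseYearOfEra(s: String): Try[LocalDate] =
    Try(LocalDate.parse(s, yearOfEra))

  def parseProlepticYear(s: String): Try[LocalDate] =
    Try(LocalDate.parse(s, prolepticYear))

  def main(args: Array[String]): Unit = {
    println(parseYearOfEra("0000-01-01"))     // a Failure wrapping a DateTimeParseException
    println(parseProlepticYear("0000-01-01")) // a Success holding year 0
  }
}
```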

Member

@HyukjinKwon HyukjinKwon left a comment

Looks fine but let me leave it to @hvanhovell

def fromString(str: String, dataType: DataType): Literal = {
  def parse[T](f: UTF8String => Option[T]): T = {
    f(UTF8String.fromString(str)).getOrElse {
      throw new AnalysisException(s"Cannot parse the ${dataType.catalogString} value: $str")
Contributor

Shouldn't we just throw a ParseException here?

Member

It looks like this can be called from places other than the parser. In that case, a parse exception might not make much sense; it may have to be an illegal argument exception or a runtime exception.
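
The pattern under discussion, an Option-returning parser whose failure is turned into an exception, can be sketched outside Spark. Everything here is hypothetical: `LiteralParseSketch`, `toDate`, and `parseOrFail` are illustrative names, `toDate` merely stands in for `stringToDate`, and `IllegalArgumentException` is just one of the alternatives mentioned above, not the choice made in the PR.

```scala
import java.time.LocalDate
import scala.util.Try

object LiteralParseSketch {
  // Hypothetical stand-in for stringToDate: None when the input is not a valid date.
  def toDate(s: String): Option[LocalDate] =
    Try(LocalDate.parse(s.trim)).toOption

  // Applies an Option-returning parser and raises a caller-agnostic exception on failure.
  def parseOrFail[T](str: String, typeName: String)(f: String => Option[T]): T =
    f(str).getOrElse {
      throw new IllegalArgumentException(s"Cannot parse the $typeName value: $str")
    }

  def main(args: Array[String]): Unit = {
    println(parseOrFail("1970-01-01", "date")(toDate))
    println(Try(parseOrFail("not-a-date", "date")(toDate)))
  }
}
```

Keeping the exception type independent of the SQL parser means the helper behaves the same whether it is reached from parsing or from any other caller.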

/**
 * Constructs a Literal from a String
 */
def fromString(str: String, dataType: DataType): Literal = {
Member

Wait, actually, was it ever properly called anywhere before? If not, we could just remove it. Since all classes in catalyst are considered internal API, we can simply remove a method that is not called.

Member Author

@cloud-fan I remember you mentioned this method can be used somewhere like logging. If not, I will remove it. Also cc @gatorsmile as you added tests for the method recently: #22345

Member

Is there an explicit plan to use this somewhere? If so, that's okay.

Otherwise, let's remove it. In general, we shouldn't keep internal code that is not being called. Removed code remains in the commit history and in other branches, so we can always revive it when it is actually used or needed for deduplication.

Contributor

I think we can remove it now. It was there for TreeNode.fromJSON, which was removed a long time ago.

Member Author

ok. Here is the PR to remove the methods: #23603

@SparkQA

SparkQA commented Jan 20, 2019

Test build #101441 has finished for PR 23596 at commit 9623acf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk changed the title from "[SPARK-26652] Use Proleptic Gregorian Calendar in Literal.fromString" to "[SPARK-26652][SQL] Use Proleptic Gregorian Calendar in Literal.fromString" Jan 21, 2019

case "TIMESTAMP" =>
  val timeZone = getTimeZone(SQLConf.get.sessionLocalTimeZone)
  toLiteral(stringToTimestamp(_, timeZone), TimestampType)
case "DATE" => toLiteral(DateType)
Contributor

can we just do Cast(Literal(str), DateType)? Or Literal.create(Cast(...).eval, DateType)

Member Author

I did that in another PR but @hvanhovell doesn't like that.

Member

It looks like the current way is cleaner (see 016c696). I'm not sure where he said that in the PR, though.
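
The TIMESTAMP branch in the hunk above passes the session time zone to stringToTimestamp, so the same literal text can resolve to different instants. A minimal sketch of that dependency, using plain `java.time` in place of Spark's helper; `SessionZoneSketch` and `toMicros` are hypothetical names, and the microsecond unit is an assumption made for illustration.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

object SessionZoneSketch {
  private val fmt = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")

  // Microseconds since the epoch for a zone-less timestamp string read in `zone`.
  def toMicros(text: String, zone: ZoneId): Long = {
    val instant = LocalDateTime.parse(text, fmt).atZone(zone).toInstant
    ChronoUnit.MICROS.between(Instant.EPOCH, instant)
  }

  def main(args: Array[String]): Unit = {
    val text = "2019-01-20 00:00:00"
    println(toMicros(text, ZoneId.of("UTC")))
    println(toMicros(text, ZoneId.of("America/Los_Angeles")))
  }
}
```

The two printed values differ by the zone offset in effect on that date, which is why the literal has to be resolved against a specific time zone.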

@asfgit asfgit closed this in 4c1cd80 Jan 21, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

The `fromString` and `fromJSON` methods of the `Literal` object are removed because they are not used.

Closes apache#23596

Closes apache#23603 from MaxGekk/remove-literal-fromstring.

Authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@MaxGekk MaxGekk deleted the gregorian-literals-from-strings branch August 17, 2019 13:36

6 participants